Abstract

In this article, we propose some new generalizations of M-estimation procedures
for single-index regression models in the presence of randomly right-censored responses.
We derive consistency and asymptotic normality of our estimates. The results are
stated in a form that adapts to a wide range of techniques used in a censored
regression framework (e.g. synthetic data or weighted least squares). As in the
uncensored case, the estimator of the single-index parameter is seen to have the
same asymptotic behavior as in a fully parametric scheme. We compare these new
estimators with those based on the average derivative technique of Burke and Lu
(2005) through a simulation study.

Kaplan-Meier estimator, single-index models.



1 Introduction 

In regression analysis, one investigates the function $m(x) = E[Y \mid X = x]$, which is
traditionally estimated from independent copies $(Y_i, X_i)_{1 \le i \le n} \in \mathbb{R}^{1+d}$.
The parametric approach consists of assuming that the function $m$ belongs to some parametric
family, that is $m(x) = f_0(\theta_0, x)$, where $f_0$ is a known function and $\theta_0$ an
unknown finite dimensional parameter. On the other hand, the nonparametric approach requires
fewer assumptions on the model, since it consists of estimating $m$ without presuming the shape
of the function. However, this approach suffers from the so-called "curse of dimensionality",
that is the difficulty of estimating the function $m$ properly when the dimension $d$
is high (in practice, $d > 3$). To avoid this important drawback of nonparametric approaches,
while allowing more flexibility than a purely parametric model, one may use
the semi-parametric single-index model (SIM in the following), which states

\[ m(x) = E[Y \mid X'\theta_0 = x'\theta_0] = f(x'\theta_0; \theta_0), \]

where $f$ is an unknown function and $\theta_0$ an unknown finite dimensional parameter. If
$\theta_0$ were known, the problem would be a nonparametric one, but with the covariates
belonging to a one-dimensional space.

In this framework, numerous semi-parametric approaches have been proposed for root-n
consistent estimation of $\theta_0$. Typically, these approaches can be split into three main
categories: M-estimation (Ichimura, 1993, Sherman, 1994b, Delecroix and Hristache,
1999, Xia and Li, 1999, Xia, Tong, and Li, 1999, Delecroix, Hristache and Patilea, 2006),
average derivative based estimation (Powell, Stock and Stoker, 1989, Härdle and Stoker,
1989, Hristache et al., 2001a, 2001b), and iterative methods (Weisberg and Welsh, 1994,
Chiou and Müller, 1998, Bonneu and Gba, 1998, Xia and Härdle, 2002).

If the responses of this regression model are randomly right-censored, these approaches
clearly need to be adapted, since the random variable $Y$ is not directly observed. The right
censoring model states that, instead of observing $Y$, one observes i.i.d. replications of

\[ T = Y \wedge C, \qquad \delta = 1_{Y \le C}, \qquad (1.1) \]

where $C$ is some "censoring variable", and $1_A$ denotes the indicator function of the set $A$.
In this setting, the semi-parametric Cox regression model (see e.g. Andersen and Gill, 1982)
can be seen as a particular case of the SIM model, but allows less flexibility. Moreover,
it is still interesting to extend mean-regression models to the censored framework. For
this reason, Buckley and James (1978) proposed an estimator of the linear model under
random censoring, and Lai and Ying (1991) and Ritov (1990) proved its asymptotic
normality. Koul, Susarla and Van Ryzin (1981) initiated what we may call the "synthetic
data" approach, based on transformations of the data. See Leurgans (1987), Zhou (1992b)
and Lai et al. (1995). Zhou (1992a) also proposed a weighted least-squares approach,
applying weights in the least-squares criterion in order to compensate for the censoring.
These techniques were then used in the nonlinear regression setting, that is when $f_0$
is known but nonlinear. Stute (1999) established a connection between the weighted
least-squares criterion and Kaplan-Meier integrals. Delecroix, Lopez and Patilea (2006)
extended the synthetic data approach. Heuchenne and Van Keilegom (2005) modified
the Buckley-James technique for polynomial regression. When it comes to the
SIM model under random censoring, Burke and Lu (2005) recently proposed an estimate
using an extension of the average derivative technique of Härdle and Stoker (1989) and
the synthetic data approach of Koul, Susarla and Van Ryzin (1981).

In this paper, we propose a semi-parametric M-estimator of the SIM model under
random censoring. We present a technique that is adapted to both main classes of censored
regression techniques (synthetic data and weighted least squares), deriving root-n
consistency of our estimate of $\theta_0$, and then using it to estimate $m(x)$. Another
advantage of our technique is that we do not require the covariates $X$ to have a density
with respect to Lebesgue measure (only the linear combinations $\theta'X$ need to be
absolutely continuous), which is an important advantage over the estimation
procedure of Burke and Lu (2005).

The paper is organized as follows. In Section 2 we present the regression model and
our methodology. In Section 3 we derive consistency of our semi-parametric estimates
in a general form, and asymptotic normality is obtained in Section 4. A simulation study is
presented in Section 5 to test the validity of our estimates with finite samples. Section 6
is devoted to technical proofs.

2 Model assumptions and methodology

In the following, we assume the regression model

\[ Y = f(\theta_0'X; \theta_0) + \varepsilon, \]

where $\theta_0$ is a vector whose first component equals 1, and $E[\varepsilon \mid X] = 0$.
The function $f$ is defined by $f(u; \theta) = E[Y \mid X'\theta = u]$. Considering the
censoring model (1.1), we define the following distribution functions,

\[ F(t) = P(Y \le t), \]
\[ G(t) = P(C \le t), \]
\[ H(t) = P(T \le t), \]
\[ F_{(X,Y)}(x,t) = P(Y \le t, X \le x). \]

In the following, we will assume that

\[ \inf\{t : F(t) = 1\} = \inf\{t : H(t) = 1\}, \qquad (2.2) \]
\[ P(Y = C) = 0. \qquad (2.3) \]

If (2.2) does not hold, some part of the distribution of $Y$ remains unobserved, and
consistent estimation requires additional restrictive assumptions on the
law of the residuals. Note that, in this case, our estimators will still be root-n convergent,
but not necessarily to $\theta_0$. Assumption (2.3) is used to avoid dissymmetry
problems between $C$ and $Y$.

As a property of conditional expectation, for any function $J(\cdot) \ge 0$, we have

\[ \theta_0 = \arg\min_{\theta \in \Theta} E\left[ (Y - f(\theta'X; \theta))^2 J(X) \right]
            = \arg\min_{\theta \in \Theta} M(\theta, f) \qquad (2.4) \]
\[ \phantom{\theta_0} = \arg\min_{\theta \in \Theta} \int (y - f(\theta'x; \theta))^2 J(x) \, dF_{(X,Y)}(x,y). \]

Of course, equation (2.4) cannot be used directly to compute $\theta_0$, since two objects
are missing in the definition of $M$, namely the distribution function $F_{(X,Y)}(x,y)$ and
the regression function $f(\theta'x; \theta)$. A natural way to proceed consists of
estimating these two functions, and then plugging these estimators into (2.4).



2.1 Estimating the distribution function 

We already mentioned that there are two main approaches for studying regression models in
the presence of censoring, the Weighted Least Squares approach (WLS in the following) and
the Synthetic Data approach (SD in the following).

The WLS approach. In the uncensored case, the distribution function $F_{(X,Y)}$ can
be estimated using the empirical distribution. This tool is unavailable under random
censoring, since it relies on the (unobserved) $(Y_i)_{1 \le i \le n}$. Under random
censoring, Stute (1993) proposed to use an estimator based on the Kaplan-Meier estimator
of $F$. Recall the definition of the Kaplan-Meier estimator,

\[ \hat F(t) = 1 - \prod_{i : T_i \le t} \left( 1 - \frac{1}{n[1 - \hat H(T_i-)]} \right)^{\delta_i}, \]

where $\hat H$ denotes the empirical distribution function of $T$. $\hat F$ can be rewritten as

\[ \hat F(t) = \sum_{i=1}^n W_{in} 1_{T_i \le t}, \]

where $W_{in}$ is the jump at observation $i$. It is particularly interesting to notice that the
jump at observation $i$ is connected to the Kaplan-Meier estimate $\hat G$ of $G$ at the same
value (see, for example, Satten and Datta, 2000), that is

\[ W_{in} = \frac{1}{n} \frac{\delta_i}{1 - \hat G(T_i-)}. \qquad (2.5) \]

The Kaplan-Meier estimate is known to be a consistent estimate of $F$ under the two following
identifiability assumptions.

Assumption 1 $Y$ and $C$ are independent.
Assumption 2 $P(Y \le C \mid X, Y) = P(Y \le C \mid Y)$.

A major case for which Assumptions 1 and 2 hold is the case where $C$ is independent
of $(Y, X)$. However, Assumption 2 is more general and covers a significant number of
situations (see Stute, 1999).
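
The identity (2.5) can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes there are no ties among the observed times, and the function names `km_jumps` and `censoring_survival_left` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def km_jumps(t, delta):
    """Jumps W_in of the Kaplan-Meier estimator of F (assumes no ties in t)."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = t.size
    order = np.argsort(t, kind="stable")
    d = delta[order]
    at_risk = n - np.arange(n)                      # n, n-1, ..., 1
    factors = 1.0 - d / at_risk                     # product-limit factors for F
    surv_left = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w = np.empty(n)
    w[order] = surv_left * d / at_risk              # jump at each ordered time
    return w

def censoring_survival_left(t, delta):
    """1 - G_hat(T_i-): Kaplan-Meier estimate for C, left limit at each T_i."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = t.size
    order = np.argsort(t, kind="stable")
    c = 1.0 - delta[order]                          # censoring acts as the "event"
    at_risk = n - np.arange(n)
    factors = 1.0 - c / at_risk
    surv_left = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    out = np.empty(n)
    out[order] = surv_left
    return out
```

With uncensored data every jump equals 1/n, so the weighted criterion reduces to the usual empirical one.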

The SD approach. The SD approach consists of considering some alternative
variable which has the same conditional expectation as $Y$. For this, observe that, through
elementary calculus, under Assumptions 1 and 2,

\[ E\left[ \frac{\delta \phi(X,T)}{1 - G(T-)} \,\Big|\, X \right] = E[\phi(X,Y) \mid X]. \qquad (2.6) \]

From (2.6), we see that, if we define, following Koul et al. (1981),

\[ Y^* = \frac{\delta T}{1 - G(T-)}, \]

we have $E[Y^* \mid X] = E[Y \mid X]$ under Assumptions 1 and 2. Hence, if $Y^*$ were available,
the same regression techniques as in the uncensored case could be applied to $Y^*$. Of
course, $Y^*$ cannot be computed, since it depends on the unknown function $G$. But $Y^*$
can be easily estimated (which is not the case for $Y$) by replacing $G$ by its Kaplan-Meier
estimate. For $i = 1, \ldots, n$ we obtain

\[ \hat Y_i^* = \frac{\delta_i T_i}{1 - \hat G(T_i-)}. \]

See also Leurgans (1987) and Lai et al. (1995) for other kinds of transformations.
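
As an illustration of the transformation above, here is a minimal sketch that computes the estimated synthetic responses. It assumes no ties among the observed times, and `synthetic_responses` is a hypothetical name, not taken from the paper.

```python
import numpy as np

def synthetic_responses(t, delta):
    """Estimated Koul-Susarla-Van Ryzin responses delta_i T_i / (1 - G_hat(T_i-)).

    Assumes no ties in t; G_hat is the Kaplan-Meier estimator of the censoring law.
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = t.size
    order = np.argsort(t, kind="stable")
    cens = 1.0 - delta[order]                       # 1 where the observation is censored
    at_risk = n - np.arange(n)
    factors = 1.0 - cens / at_risk
    surv_left_sorted = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    surv_left = np.empty(n)
    surv_left[order] = surv_left_sorted             # 1 - G_hat(T_i-) in original order
    return delta * t / surv_left
```

Censored observations are mapped to zero and uncensored ones are inflated, so that the sample mean of the synthetic responses coincides with the Kaplan-Meier integral of the identity function.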
Back to equation (2.4), the SD approach first consists of observing that

\[ \theta_0 = \arg\min_{\theta \in \Theta} E\left[ (Y^* - f(\theta'X; \theta))^2 J(X) \right]
            = \arg\min_{\theta \in \Theta} M^*(\theta, f) \qquad (2.7) \]
\[ \phantom{\theta_0} = \arg\min_{\theta \in \Theta} \int (y^* - f(\theta'x; \theta))^2 J(x) \, dF_{(X,Y^*)}(x,y^*), \]

where $F_{(X,Y^*)}(x, y^*) = P(X \le x, Y^* \le y^*)$.

Note that $M^*$ and $M$ are not the same functions. Indeed, $Y^*$ has the
same conditional expectation as $Y$ (hence $M$ and $M^*$ have the same minimizer $\theta_0$), but
it does not have the same law.

2.2 Estimating $f(\theta'x; \theta)$

In the uncensored case, a common nonparametric way to estimate a conditional expectation
is to use kernel smoothing. In this case, the Nadaraya-Watson estimate of $f(\theta'x; \theta)$
is

\[ \hat f(\theta'x; \theta) =
   \frac{\int y K\left( \frac{\theta'x - \theta'u}{h} \right) d\hat F_{emp}(u,y)}
        {\int K\left( \frac{\theta'x - \theta'u}{h} \right) d\hat F_{emp}(u,y)}. \]

We are still facing the same problem: the empirical distribution function is unavailable.
However, the WLS and SD approaches can be used to extend the Nadaraya-Watson estimate
to censored regression. In the following, we will only use the SD approach of Koul et al.
to estimate the conditional expectation, that is

\[ \hat f(u; \theta) = \frac{\sum_{i=1}^n \hat Y_i^* K\left( \frac{\theta'X_i - u}{h} \right)}
                            {\sum_{i=1}^n K\left( \frac{\theta'X_i - u}{h} \right)}. \qquad (2.8) \]

With this estimator, we do not have to deal with Kaplan-Meier integrals in the
denominator. In fact, the integral in the denominator becomes an integral with respect to
the empirical distribution function of $X$. However, alternative estimates (not necessarily
kernel estimates) can still be used, provided that they satisfy some conditions, discussed
below, that yield the asymptotic properties of $\hat\theta$. Therefore we chose to present our
results without presuming the choice of $\hat f(\theta'x; \theta)$, and then to check in the
Appendix that the estimator defined in (2.8) satisfies the proper conditions.

Also observe that, using this kernel estimate, contrary to the average derivative technique
of Burke and Lu (2005), we do not need to impose that $X$ has a density with
respect to Lebesgue measure. We only need the linear combinations $\theta'X$ to have one.
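
The kernel estimator (2.8) is a ratio of two weighted sums, which can be sketched as follows. The Gaussian kernel and the function name `nw_single_index` are assumptions made for this illustration; the paper does not fix a particular kernel here.

```python
import numpy as np

def nw_single_index(u, theta, x, y_star, h):
    """Nadaraya-Watson estimate of f(u; theta) built on synthetic responses.

    u      : evaluation points on the index scale (scalar or 1-d array)
    theta  : index parameter, shape (d,)
    x      : covariates, shape (n, d)
    y_star : estimated synthetic responses, shape (n,)
    h      : bandwidth (> 0); a Gaussian kernel is assumed for illustration
    """
    index = x @ theta                                   # theta'X_i
    z = (index[None, :] - np.atleast_1d(u)[:, None]) / h
    k = np.exp(-0.5 * z ** 2)                           # kernel weights, shape (m, n)
    return (k @ y_star) / k.sum(axis=1)
```

The denominator involves only the covariates, which is the point made above: no Kaplan-Meier integral appears there.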

The choice of the trimming function $J$. The reason why we introduced the
function $J$ in (2.4) appears in the definition (2.8). To ensure uniform consistency of
this estimate, we will need to bound the denominator away from zero. For this, we will
need to restrict the integration domain to a set where $f_{\theta'X}(u)$ is bounded away from
zero, $f_{\theta'X}$ denoting the density of $\theta'X$. If we knew $\theta_0$, we could consider a set
$B_0 = \{u : f_{\theta_0'X}(u) \ge c\}$ for some constant $c > 0$, and use the trimming
$J(\theta_0'X) = 1_{\theta_0'X \in B_0}$.
Of course, this ideal trimming cannot be computed, since it depends on the unknown
parameter $\theta_0$. Delecroix et al. (2006) proposed a way to approximate this trimming from
the data. Given some preliminary consistent estimator $\theta_n$ of $\theta_0$, they use the
following trimming,

\[ J_n(\theta_n'X) = 1_{\hat f_{\theta_n'X}(\theta_n'X) \ge c}. \]

In the following proofs, we will mostly focus on the estimation using the uncomputable
trimming $J(\theta_0'X)$, and we will show in the Appendix that there is no asymptotic
difference in using $J_n(\theta_n'X)$ rather than $J(\theta_0'X)$.




2.3 Estimation of the single-index parameter

Preliminary estimate of $\theta_0$. For a preliminary estimate, we assume, as in Delecroix
et al. (2006), that we know some set $B$ such that
$\inf_{x \in B, \theta \in \Theta} f_{\theta'X}(\theta'x) > c > 0$, and we
consider the trimming function $J(x) = 1_{x \in B}$. To compute our estimate $\theta_n$, we can
then use either the WLS or the SD approach. For example, using the WLS approach, let

\[ \theta_n = \arg\min_{\theta \in \Theta} \int (y - \hat f(\theta'x; \theta))^2 J(x) \, d\hat F_{(X,Y)}(x,y)
            = \arg\min_{\theta \in \Theta} M_n(\theta, \hat f). \qquad (2.9) \]

Estimation of $\theta_0$. In view of (2.4) and (2.7), we define our estimates of $\theta_0$
according to the two regression approaches discussed above,

\[ \hat\theta^{WLS} = \arg\min_{\theta \in \Theta_n} \int [y - \hat f(\theta'x; \theta)]^2 J_n(\theta_n'x) \, d\hat F_{(X,Y)}(x,y)
                    = \arg\min_{\theta \in \Theta_n} M_n^{WLS}(\theta, \hat f), \]

\[ \hat\theta^{SD} = \arg\min_{\theta \in \Theta_n} \int [y^* - \hat f(\theta'x; \theta)]^2 J_n(\theta_n'x) \, d\hat F^*(x,y^*)
                   = \arg\min_{\theta \in \Theta_n} M_n^{SD}(\theta, \hat f). \]

In the definition above, for technical convenience, we restricted our optimization to
shrinking neighborhoods $\Theta_n$ of $\theta_0$, chosen according to the preliminary
estimate $\theta_n$.
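
To make the WLS criterion concrete, the sketch below evaluates a weighted sum of squared residuals in which the regression function is replaced by an in-sample kernel smoother on the index. Several choices here are assumptions for illustration only: the Gaussian kernel, the function names, the inputs `w` (Kaplan-Meier jumps) and `y_star` (synthetic responses) being precomputed, and the crude grid search over $\theta = (1, \beta)'$ in dimension 2, which stands in for a proper optimizer.

```python
import numpy as np

def wls_criterion(theta, x, t, w, y_star, trim, h):
    """Evaluate sum_i w_i * trim_i * (t_i - f_hat(theta'x_i; theta))^2.

    w      : Kaplan-Meier jumps W_in (equal to 1/n without censoring)
    y_star : synthetic responses used inside the kernel smoother
    trim   : 0/1 trimming values J_n evaluated at each observation
    """
    index = x @ theta
    z = (index[:, None] - index[None, :]) / h
    k = np.exp(-0.5 * z ** 2)                       # Gaussian kernel (assumption)
    f_hat = (k @ y_star) / k.sum(axis=1)
    return float(np.sum(w * trim * (t - f_hat) ** 2))

def minimize_on_grid(betas, x, t, w, y_star, trim, h):
    """Grid search over theta = (1, beta)' for d = 2; returns the best beta."""
    values = [wls_criterion(np.array([1.0, b]), x, t, w, y_star, trim, h)
              for b in betas]
    return float(betas[int(np.argmin(values))])
```

Fixing the first component of $\theta$ to 1 mirrors the identification constraint on $\theta_0$ stated in Section 2.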

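2.4 Estimation of the regression function

With a root-n consistent estimate of $\theta_0$ at hand, it is possible to estimate the
regression function by using $\hat\theta$ and some estimate $\hat f$. For example, using
$\hat f$ defined in (2.8) leads to

\[ \hat f(\hat\theta'x; \hat\theta) =
   \frac{\sum_{i=1}^n \hat Y_i^* K\left( \frac{\hat\theta'X_i - \hat\theta'x}{h} \right)}
        {\sum_{i=1}^n K\left( \frac{\hat\theta'X_i - \hat\theta'x}{h} \right)}. \]
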
3 Consistent estimation of $\theta_0$

In this section, we prove consistency of $\theta_n$, where $\theta_n$ is defined in (2.9). As a
consequence, $\hat\theta$ is consistent since it is obtained from minimization over a shrinking
neighborhood of $\theta_0$. We will need two kinds of assumptions to achieve consistency:
general assumptions on the regression model, including identifiability assumptions for
$\theta_0$, and assumptions on $\hat f$.

Identifiability assumptions for $\theta_0$ and assumptions on the regression model.

Assumption 3 $E[Y^2] < \infty$.

Assumption 4 If $M(\theta_1, f) = M(\theta_0, f)$, then $\theta_1 = \theta_0$.

Assumption 5 $\Theta$ and $\mathcal{X} = \mathrm{Supp}(X)$ are compact subsets of $\mathbb{R}^d$
and $f$ is continuous with respect to $x$ and $\theta$. Furthermore, assume that
$|f(\theta_1'x; \theta_1) - f(\theta_2'x; \theta_2)| \le \|\theta_1 - \theta_2\|^{\gamma} \Phi(X)$,
for a bounded function $\Phi(X)$, and for some $\gamma > 0$.

Assumption 3 is implicitly needed in order to define $M$, while Assumption 4 ensures
the identification of the parameter $\theta_0$. On the other hand, Assumption 5 states that
the class of functions $\mathcal{F} = \{f(\theta'\cdot; \theta), \theta \in \Theta\}$ is sufficiently
regular to satisfy a uniform law of large numbers. More precisely, Assumption 5 ensures that
this class is Euclidean for a bounded envelope, according to Pakes and Pollard (1989).
Observe that the condition that $\Phi$ is bounded can be weakened, by replacing it by a
moment assumption on $\Phi$. However, this condition is quite natural in a context where
we assume that the covariates are bounded random vectors, and it simplifies
our discussion. Moreover, it implies that $f$ is a bounded function of $\theta$ and $x$.

Assumptions on $\hat f$.

Assumption 6 For any function $g$, define, for $c > 0$,

\[ \|g\|_\infty = \sup_{\theta \in \Theta, x} |g(\theta'x; \theta)| \, 1_{f_{\theta'X}(\theta'x) > c/2}. \]

Assume that $\|\hat f - f\|_\infty = o_P(1)$.

See the Appendix (Section 6) for more details showing that the kernel estimator (2.8) satisfies
this assumption under some additional integrability assumptions on the variable $Y$.

Theorem 3.1 Under Assumptions 1 to 6 we have

\[ \sup_{\theta \in \Theta} \left| M_n(\theta, \hat f) - M_\infty(\theta, f) \right| = o_P(1). \]

As an immediate corollary, $\theta_n \to \theta_0$ in probability.

Proof.

Step 1: replacing $\hat f$ by $f$. Observe that, since the integration domain is restricted
to the set $B$,

\[ \left| M_n(\theta, \hat f) - M_n(\theta, f) \right| \le \|\hat f - f\|_\infty
   \times \left\{ \|\hat f + f\|_\infty \int d\hat F_{(X,Y)}(x,y) + 2 \int |y| \, d\hat F_{(X,Y)}(x,y) \right\}. \]

Now using Assumption 6, deduce that
$\sup_{\theta \in \Theta} |M_n(\theta, \hat f) - M_n(\theta, f)| = o_P(1)$.

Step 2: $M_n(\theta, f)$. Showing that
$\sup_{\theta \in \Theta} |M_n(\theta, f) - M_\infty(\theta, f)| = o_P(1)$ can then be
done in the same way as in a nonlinear regression model such as in Stute (1999). See the
proof of Theorem 1.1 in Stute (1999). ■

4 Asymptotic normality

As in the uncensored case, we will show that, asymptotically speaking, our estimators
behave as if the true family of functions $f$ were known. Hence studying the asymptotic
normality of our estimates reduces to studying asymptotic properties of estimators in a
parametric censored nonlinear regression model, such as those studied by Stute (1999)
and Delecroix et al. (2008). We first recall some elements about the case "$f$ known"
(which corresponds to a nonlinear regression setting), and then show that, under some
additional conditions on $\hat f$ and on the model, our estimation of $\theta_0$ is asymptotically
equivalent to the one performed in this unreachable parametric model.

4.1 The case $f$ known

This case can be studied using the results of Stute (1999) for the WLS approach, or the
results of Delecroix et al. (2008) for the SD approach. We recall some assumptions under
which the asymptotic normality of the corresponding estimators is obtained.

Assumptions on the model. We denote by $\nabla_\theta f(x; \theta)$ the vector of partial
derivatives of $(x, \theta) \to f(\theta'x; \theta)$ with respect to $\theta$, and by
$\nabla_\theta^2 f$ the corresponding Hessian matrix.

Assumption 7 $f(\theta'x; \theta)$ is twice continuously differentiable with respect to $\theta$,
and $\nabla_\theta f$ and $\nabla_\theta^2 f$ are bounded as functions of $x$ and $\theta$.

Assumptions on the censoring. We need some additional integrability conditions.
We first need a moment assumption which is related to the fact that we need to have
$E[Y^{*4}] < \infty$.

Assumption 8

\[ \int \frac{y^4 \, dF(y)}{[1 - G(y-)]^3} < \infty. \]

Actually $x$ is not involved in Assumption 8 as it is assumed to be bounded. Furthermore,
in the case $f$ known, this assumption can be weakened, but it will be needed in the case
$f$ unknown to obtain a uniform consistency rate for $\hat f$. The following assumption is used
in Stute (1995, 1996) to achieve asymptotic normality of Kaplan-Meier integrals.

Assumption 9 Let

\[ C(y) = \int_{-\infty}^{y-} \frac{dG(s)}{\{1 - H(s)\}\{1 - G(s)\}}. \]

Assume that

\[ \int y \, C^{1/2}(y) \, dF_{(X,Y)}(x,y) < \infty. \]

See Stute (1995) for a full discussion of this kind of assumption. Using our kernel
estimator for estimating the conditional expectation will lead us to a slightly stronger
assumption (see the Appendix), which is

Assumption 10 For some $\varepsilon > 0$,

\[ \int y \, C^{1/2+\varepsilon}(y) [1 - G(y-)]^{-1} \, dF_{(X,Y)}(x,y) < \infty. \]

In the following, we will use the (slightly) stronger Assumption 10 since it may simplify
some proofs (see Lemma 6.2 and the proof of Theorem 4.1). However, Assumption 10
could be replaced by Assumption 9 if we were to use an estimator (not necessarily a kernel
estimator) which would not require Assumption 10 to satisfy the proper convergence
assumptions. Note that this kind of assumption is classical in studying regression models
with censored responses. Although it is not mentioned in Burke and Lu (2005), a similar
assumption is implicitly needed to obtain equation (2.29) of Lai et al. (1995). In their
proof of Lemma A.7, page 199 of Burke and Lu (2005), the authors refer to equation (2.29),
page 275 of Lai et al. (1995): this only holds under condition C3 of Lai et al. (1995),
which basically controls the tail behavior of the distributions.

The following Theorem can be deduced from the proof of Theorem 1.2 in Stute (1999)
and of Theorem 4 in Delecroix et al. (2008). However, to make this article self-contained,
a short proof of this result is postponed to Section 6.1 of the Appendix.



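Theorem 4.1 Define

\[ \psi(y, T, \delta) = \frac{(1 - \delta) 1_{T < y}}{1 - H(T-)} - \int \frac{1_{T > v, \, y > v} \, dG(v)}{[1 - H(v)]^2}, \]

and let

\[ U^{WLS} = \frac{\delta (T - f(\theta_0'X; \theta_0)) \nabla_\theta f(X; \theta_0)}{1 - G(T-)}
   + \int \{y - f(\theta_0'x; \theta_0)\} \nabla_\theta f(x; \theta_0) \psi(y, T, \delta) \, dF_{(X,Y)}(x,y), \]

\[ U^{SD} = \left( \frac{\delta T}{1 - G(T-)} - f(\theta_0'X; \theta_0) \right) \nabla_\theta f(X; \theta_0)
   + \int y \nabla_\theta f(x; \theta_0) \psi(y, T, \delta) \, dF_{(X,Y)}(x,y), \]

and let $W^{WLS} = E[U^{WLS} (U^{WLS})']$ and $W^{SD} = E[U^{SD} (U^{SD})']$. Let $M_n$ and
$M_\infty$ denote respectively either $M_n^{WLS}$ and $M$, or $M_n^{SD}$ and $M^*$. We have,
under Assumptions 1 to 10,

\[ M_n(\theta, f) = M_\infty(\theta, f) + O_P\left( \frac{\|\theta - \theta_0\|}{\sqrt n} \right)
   + o_P(\|\theta - \theta_0\|^2) + R_n, \qquad (4.10) \]

\[ M_n(\theta, f) = \frac{1}{2} (\theta - \theta_0)' V (\theta - \theta_0)
   + \frac{1}{\sqrt n} (\theta - \theta_0)' W_n + o_P(n^{-1}) + R_n, \qquad (4.11) \]

where $R_n$ does not depend on $\theta$, where

\[ V = E[\nabla_\theta f(X; \theta_0) \nabla_\theta f(X; \theta_0)'], \]

and where $W_n \Rightarrow \mathcal{N}(0, W)$, for $W = W^{WLS}$ and $W = W^{SD}$ in the WLS case and
the SD case respectively.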



In view of Theorems 1 and 2 of Sherman (1994), (4.10) states that, in the case where
$f$ is known, $\|\hat\theta - \theta_0\| = O_P(n^{-1/2})$, while (4.11) gives the asymptotic law of
$\hat\theta$, showing that $n^{1/2}(\hat\theta - \theta_0) \Rightarrow \mathcal{N}(0, V^{-1} W V^{-1})$,
in both the WLS and SD cases.

4.2 The case $f$ unknown

As $f$ is unknown in the SIM model, we need to add some conditions on the rate of
convergence of $\hat f$.

Assumptions on $f$. If we evaluate the function $\nabla_\theta f(x; \theta)$ at the point
$(x, \theta_0)$, a direct adaptation of Lemma A.5 of Dominitz and Sherman (2003) shows that

\[ \nabla_\theta f(x; \theta_0) = f'(\theta_0'x) \{x - E[X \mid \theta_0'X = \theta_0'x]\}, \qquad (4.12) \]

where $f'$ denotes the derivative with respect to $t$ of the function $f(t; \theta_0)$.

Assumption 11 We assume that the function $f(t; \theta_0)$ is continuously differentiable with
respect to $t$; its derivative is denoted by $f'$ and is bounded.

We will also assume some regularity of the model.

Assumption 12 $u \to f(u; \theta_0)$, where $u$ ranges over $\theta_0'\mathcal{X}$, is assumed to
belong to some Donsker class of functions $\mathcal{F}$.

We have in mind the class $\mathcal{F} = C^1(\theta_0'\mathcal{X}, M)$, that is the class of
functions $\phi$ defined on $\theta_0'\mathcal{X}$ that are once differentiable with
$\|\phi\|_\infty + \|\phi'\|_\infty \le M$ (see Section 2.7 in Van
der Vaart and Wellner, 1996). It is important not to impose too much regularity on the
regression model, since, as we will see in Assumption 13, $\hat f$ will also be required to
belong to this class with probability tending to one.

Assumptions on $\hat f$.

Assumption 13 With probability tending to one, $u \to \hat f(u; \theta_0) \in \mathcal{F}$, where
$\mathcal{F}$ is defined in Assumption 12. Furthermore,

\[ \|\nabla_\theta \hat f - \nabla_\theta f\|_\infty = o_P(1), \qquad (4.13) \]

and, defining $W_i^* = n^{-1} \delta_i [1 - G(T_i-)]^{-1}$,

\[ \sup_{\theta \in \Theta_n} \left\| \sum_{i=1}^n W_i^* J(\theta_0'X_i) [T_i - f(\theta_0'X_i; \theta_0)]
   [\nabla_\theta \hat f(X_i; \theta) - \nabla_\theta f(X_i; \theta)] \right\| = o_P(n^{-1/2}), \qquad (4.14) \]

\[ \sup_{\theta \in \Theta_n} \left\| \sum_{i=1}^n W_i^* J(\theta_0'X_i) (f(\theta_0'X_i; \theta_0) - \hat f(\theta_0'X_i; \theta_0))
   (\nabla_\theta f(X_i; \theta) - \nabla_\theta \hat f(X_i; \theta)) \right\| = o_P(n^{-1/2}). \qquad (4.15) \]

We can now state our asymptotic normality theorem.

Theorem 4.2 Under Assumptions 1 to 13 we have

\[ \sqrt n (\hat\theta^{WLS} - \theta_0) \Rightarrow \mathcal{N}(0, V^{-1} W^{WLS} V^{-1}), \]
\[ \sqrt n (\hat\theta^{SD} - \theta_0) \Rightarrow \mathcal{N}(0, V^{-1} W^{SD} V^{-1}). \]
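
The limiting covariance has the familiar sandwich form $V^{-1} W V^{-1}$. The sketch below shows how one might turn estimates of $V$ and $W$ into approximate standard errors. It is an illustration under assumptions: the function names are hypothetical, and `v` and `w` are taken to be restricted to the free components of $\theta$ (the first component is fixed to 1, and by (4.12) the full matrix $V$ is degenerate in the direction $\theta_0$).

```python
import numpy as np

def sandwich_covariance(v, w):
    """Return V^{-1} W V^{-1} without forming explicit inverses."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    v_inv_w = np.linalg.solve(v, w)                 # V^{-1} W
    return np.linalg.solve(v.T, v_inv_w.T).T        # (V^{-1} W) V^{-1}

def standard_errors(v, w, n):
    """Approximate standard errors of theta_hat for a sample of size n."""
    return np.sqrt(np.diag(sandwich_covariance(v, w)) / n)
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a numerical convenience; both give the same matrix when `v` is well conditioned.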



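Proof. First apply Proposition 6.9 to obtain that $J_n(\theta_n'X_i)$ can be replaced by
$J(\theta_0'X_i)$ or by $1_{f_{\theta'X}(\theta'X_i) > c/2}$, plus some arbitrarily small terms
which will not be mentioned in the following. Moreover, we consider $\theta \in \Theta_n$,
which is an $o_P(1)$-neighborhood of $\theta_0$.

Proof for the WLS approach. Using the representation (2.5) of the Kaplan-Meier
weights,

\[ M_n(\theta, \hat f) = M_n(\theta, f)
   - \frac{2}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i) (T_i - f(\theta'X_i; \theta))}{1 - \hat G(T_i-)}
     [\hat f(\theta'X_i; \theta) - f(\theta'X_i; \theta)] \]
\[ \qquad + \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i)}{1 - \hat G(T_i-)}
     [\hat f(\theta'X_i; \theta) - f(\theta'X_i; \theta)]^2
   = M_n(\theta, f) - 2 A_{1n} + B_{1n}. \]

First decompose $A_{1n}$ into four terms, $A_{1n} = A_{2n} + A_{3n} + A_{4n} + A_{5n}$, with

\[ A_{2n} = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i) (T_i - f(\theta_0'X_i; \theta_0))}{1 - \hat G(T_i-)}
   [\hat f(\theta_0'X_i; \theta_0) - f(\theta_0'X_i; \theta_0)], \]

\[ A_{3n} = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i) (f(\theta_0'X_i; \theta_0) - f(\theta'X_i; \theta))}{1 - \hat G(T_i-)}
   \times [\hat f(\theta'X_i; \theta) - f(\theta'X_i; \theta) - \hat f(\theta_0'X_i; \theta_0) + f(\theta_0'X_i; \theta_0)], \]

\[ A_{4n} = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i) (f(\theta_0'X_i; \theta_0) - f(\theta'X_i; \theta))}{1 - \hat G(T_i-)}
   [\hat f(\theta_0'X_i; \theta_0) - f(\theta_0'X_i; \theta_0)], \]

\[ A_{5n} = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i) (T_i - f(\theta_0'X_i; \theta_0))}{1 - \hat G(T_i-)}
   \times [\hat f(\theta'X_i; \theta) - f(\theta'X_i; \theta) - \hat f(\theta_0'X_i; \theta_0) + f(\theta_0'X_i; \theta_0)]. \]

$A_{2n}$ does not depend on $\theta$.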

For $A_{3n}$, use Assumption 5 to bound $f(\theta_0'X; \theta_0) - f(\theta'X; \theta)$ by
$M \times \|\theta - \theta_0\|$ (for some constant $M > 0$). Using a Taylor expansion, the
bracket in $A_{3n}$ can be rewritten as
$(\theta - \theta_0)' [\nabla_\theta \hat f(X; \tilde\theta) - \nabla_\theta f(X; \tilde\theta)]$
for some $\tilde\theta \in \Theta_n$. Moreover, using Proposition 6.9, we can replace
$J(\theta_0'X)$ by $1_{\{f_{\theta'X}(\theta'X) > c/2\}}$. Hence we have

\[ |A_{3n}| \le M \|\theta - \theta_0\|^2 \sup_{\theta \in \Theta, x \in \mathcal{X}}
   \|\nabla_\theta \hat f(x; \theta) - \nabla_\theta f(x; \theta)\| \int d\hat F_{(X,Y)}(x,y). \]

The uniform consistency of $\nabla_\theta \hat f$ in Assumption 13 shows that
$A_{3n} = o_P(\|\theta - \theta_0\|^2)$.

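For $A_{4n}$, use a second order Taylor expansion and the uniform consistency of $\hat f$ to
obtain

\[ A_{4n} = -\frac{(\theta - \theta_0)'}{n} \sum_{i=1}^n
   \frac{\delta_i J(\theta_0'X_i) \nabla_\theta f(X_i; \theta_0)}{1 - \hat G(T_i-)}
   [\hat f(\theta_0'X_i; \theta_0) - f(\theta_0'X_i; \theta_0)] \qquad (4.16) \]
\[ \qquad + o_P(\|\theta - \theta_0\|^2). \qquad (4.17) \]

In the first term, first replace $\hat G$ by $G$. Using Lemma 6.2 ii) with $\eta = 1$, this
introduces a remainder term which is bounded by

\[ O_P(\|\theta - \theta_0\| n^{-1/2}) \, \|\hat f - f\|_\infty \,
   \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i) C^{1/2+\varepsilon}(T_i-)}{1 - G(T_i-)}, \]

where we also used the boundedness of $\nabla_\theta f$. The uniform consistency of $\hat f$ shows
that replacing $\hat G$ by $G$ in (4.16) only gives rise to an
$o_P(\|\theta - \theta_0\| n^{-1/2})$ term. Now, we will use the regularity assumption
(Assumption 12) on $f(\cdot; \theta_0)$. If the class $\mathcal{F}$ is Donsker, the class of functions
$\mathcal{F}' = \{(\delta, T, X) \to \delta J(\theta_0'X) [1 - G(T-)]^{-1} \nabla_\theta f(X; \theta_0) \phi(\theta_0'X), \ \phi \in \mathcal{F}\}$
is Donsker, from a stability property of Donsker classes (see e.g. Van der Vaart and Wellner,
1996); that is, the functions in $\mathcal{F}$ are evaluated at $\theta_0'X_i$. Furthermore,
for all $\varphi \in \mathcal{F}'$, $E[\varphi(T_i, \delta_i, X_i)] = 0$, since

\[ E\left[ \frac{\delta_i \nabla_\theta f(X_i; \theta_0)}{1 - G(T_i-)} \,\Big|\, \theta_0'X_i \right]
   = E[\nabla_\theta f(X_i; \theta_0) \mid \theta_0'X_i] = 0, \]

from (4.12). Hence, using the fact that $\hat f(\cdot; \theta_0) \in \mathcal{F}$ with probability
tending to one, and the asymptotic equicontinuity property of Donsker classes for
$\mathcal{F}'$ (see Van der Vaart and Wellner, 1996), we obtain

\[ \frac{(\theta - \theta_0)'}{n} \sum_{i=1}^n
   \frac{\delta_i J(\theta_0'X_i) \nabla_\theta f(X_i; \theta_0)}{1 - G(T_i-)}
   [\hat f(\theta_0'X_i; \theta_0) - f(\theta_0'X_i; \theta_0)]
   = o_P(\|\theta - \theta_0\| n^{-1/2}), \]

and finally, $A_{4n} = o_P(\|\theta - \theta_0\| n^{-1/2}) + o_P(\|\theta - \theta_0\|^2)$.

Similarly, for $A_{5n}$, a Taylor expansion yields

\[ A_{5n} = \frac{(\theta - \theta_0)'}{n} \sum_{i=1}^n
   \frac{\delta_i J(\theta_0'X_i) (T_i - f(\theta_0'X_i; \theta_0))}{1 - \hat G(T_i-)}
   [\nabla_\theta \hat f(X_i; \tilde\theta) - \nabla_\theta f(X_i; \tilde\theta)] \]
\[ \phantom{A_{5n}} = \frac{(\theta - \theta_0)'}{n} \sum_{i=1}^n
   \frac{\delta_i J(\theta_0'X_i) (T_i - f(\theta_0'X_i; \theta_0))}{1 - G(T_i-)}
   \times [\nabla_\theta \hat f(X_i; \tilde\theta) - \nabla_\theta f(X_i; \tilde\theta)]
   + o_P(\|\theta - \theta_0\| n^{-1/2}), \]

for some $\tilde\theta \in \Theta_n$, where, as for $A_{4n}$, we replaced $\hat G$ by $G$ using
Lemma 6.2 ii) and the uniform consistency of $\nabla_\theta \hat f$. We then obtain
$A_{5n} = o_P(\|\theta - \theta_0\| n^{-1/2}) + o_P(\|\theta - \theta_0\|^2)$ using condition
(4.15) in Assumption 13.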
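For $B_{1n}$, write

\[ B_{1n} = \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i)}{1 - \hat G(T_i-)}
   [\hat f(\theta'X_i; \theta) - f(\theta'X_i; \theta) - \hat f(\theta_0'X_i; \theta_0) + f(\theta_0'X_i; \theta_0)]^2 \]
\[ \qquad + \frac{1}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i)}{1 - \hat G(T_i-)}
   [\hat f(\theta_0'X_i; \theta_0) - f(\theta_0'X_i; \theta_0)]^2 \]
\[ \qquad + \frac{2}{n} \sum_{i=1}^n \frac{\delta_i J(\theta_0'X_i)}{1 - \hat G(T_i-)}
   [\hat f(\theta_0'X_i; \theta_0) - f(\theta_0'X_i; \theta_0)] \]
\[ \qquad\qquad \times [\hat f(\theta'X_i; \theta) - f(\theta'X_i; \theta) - \hat f(\theta_0'X_i; \theta_0) + f(\theta_0'X_i; \theta_0)]. \]

Using a second order Taylor expansion and arguments similar to those used for $A_{3n}$, we
obtain that the first term is of order $o_P(\|\theta - \theta_0\|^2)$. The second term does not
depend on $\theta$. For the third, a first order Taylor expansion shows that it is bounded by

\[ \|\theta - \theta_0\| \, \|\nabla_\theta \hat f - \nabla_\theta f\|_\infty
   \sup_{x : J(\theta_0'x) = 1} |\hat f(\theta_0'x; \theta_0) - f(\theta_0'x; \theta_0)|
   \int d\hat F_{(X,Y)}(x,y). \]

Now condition (4.14) in Assumption 13 shows that this is $o_P(\|\theta - \theta_0\| n^{-1/2})$.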
We have just shown that

\[ M_n(\theta, \hat f) = M_n(\theta, f) + o_P\left( \frac{\|\theta - \theta_0\|}{\sqrt n} \right)
   + o_P(\|\theta - \theta_0\|^2) \]

on a set of probability tending to one. Furthermore, using (4.10) we deduce
$\|\hat\theta - \theta_0\| = O_P(n^{-1/2})$ from Theorem 1 in Sherman (1994), and since, from
(4.11), on $O_P(n^{-1/2})$-neighborhoods of $\theta_0$,

\[ M_n(\theta, \hat f) = \frac{1}{2} (\theta - \theta_0)' V (\theta - \theta_0)
   + \frac{1}{\sqrt n} (\theta - \theta_0)' W_n^{WLS} + o_P(n^{-1}), \]

we can apply Theorem 2 of Sherman (1994) to obtain the asymptotic law.

Proof for $\hat\theta^{SD}$. Proceed as for $\hat\theta^{WLS}$; the only difference is that
$\hat G$ does not appear in the terms where $T$ does not appear in the numerator. ■

5 Simulation study 

In this section, we compare the behavior of our estimator with the estimator proposed
by Burke and Lu (2005), who used the average derivative technique. We considered
three configurations.

                     Config 1                    Config 2                Config 3
$\varepsilon$        N(0, 2)                     N(0, 1)                 N(0, 1/16)
$X$                  U[-2; 2] ⊗ U[-2; 2]         U[0; 1] ⊗ U[0; 1]       B(0.6) ⊗ U[-1; 1]
$f(\theta'x;\theta)$ (1/2)(θ'x)^2 + 1            (illegible in source)   1 + 0.1(θ'x)^2 - 0.2(θ'x - 1)
$\theta_0$           (1, 1)'                     (1, 2)'                 (1, 2)'
$C$                  U[0, λ_1]                   E(λ_2)                  E(λ_3)



The first configuration is used by Burke and Lu (2005) in their simulation study.
Observe that, in this model, (2.2) does not hold (condition (2.2) is also needed in
Burke and Lu's approach), but this only introduces some asymptotic bias in the estimation.
In the second configuration, there is no such problem since $C$ is exponential. In the third
configuration, $X$ does not have a Lebesgue density, but $\theta'X$ does. In this
situation, the average derivative technique is not expected to behave well since
it requires that $X$ has a density.
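
For readers who wish to reproduce the design, the following sketch draws one censored sample from the first configuration. It is an illustration under assumptions: N(0, 2) is read as a normal law with variance 2, and the function name `simulate_config1` and its arguments are hypothetical.

```python
import numpy as np

def simulate_config1(n, lam, rng):
    """Draw (X, T, delta) from Config 1 with censoring C ~ U[0, lam]."""
    theta0 = np.array([1.0, 1.0])
    x = rng.uniform(-2.0, 2.0, size=(n, 2))          # independent uniform covariates
    eps = rng.normal(0.0, np.sqrt(2.0), size=n)      # variance 2 (assumed reading)
    y = 0.5 * (x @ theta0) ** 2 + 1.0 + eps          # single-index regression
    c = rng.uniform(0.0, lam, size=n)                # censoring variable
    t = np.minimum(y, c)                             # observed time T = min(Y, C)
    delta = (y <= c).astype(int)                     # 1 if uncensored
    return x, t, delta
```

The observed censoring proportion of a sample is then `1 - delta.mean()`, which can be tuned through `lam`.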

In each configuration, we simulated 1000 samples of different sizes $n$. For each sample,
we computed $\hat\theta^{WLS}$, $\hat\theta^{SD}$, and $\hat\theta^{AD}$, which denotes the average
derivative estimate computed from the technique of Burke and Lu (2005). We then evaluated
$\|\hat\theta - \theta_0\|^2$ for each estimate, in order to estimate the Mean Squared Error
(MSE) $E[\|\hat\theta - \theta_0\|^2]$. We used different values
of the parameters $\lambda_j$ to modify the proportion of censored responses (15%, 30%, and 50%
respectively). Results are presented in the tables below. Entries marked "?" are not
legible in the source.

Config 1                            n = 30             n = 50             n = 100
λ_1 = 2.4      θ^AD                 4.86?6 x 10^-2     2.6822 x 10^-2     1.1733 x 10^-2
               θ^WLS                1.2814 x 10^-4     4.0350 x 10^-5     2.0694 x 10^-5
               θ^SD                 1.2200 x 10^-1     8.3869 x 10^-5     1.3820 x 10^-5
λ_1 = 1.17     θ^AD                 4.5757 x 10^-2     3.3285 x 10^-2     1.8236 x 10^-2
               θ^WLS                1.5713 x 10^-4     3.8088 x 10^-5     2.9482 x 10^-5
               θ^SD                 1.6925 x 10^-4     4.0177 x 10^-5     1.9924 x 10^-5
λ_1 = 0.1      θ^AD                 1.0102 x 10^-1     7.4870 x 10^-2     5.0438 x 10^-2
               θ^WLS                8.3666 x 10^-4     1.3010 x 10^-4     3.7669 x 10^-5
               θ^SD                 1.2000 x 10^-3     6.7356 x 10^-5     2.3650 x 10^-5

Config 2                            n = 30             n = 50             n = 100
λ_2 = 2        θ^AD                 4.1260 x 10^-1     3.6920 x 10^-1     3.4151 x 10^-1
               θ^WLS                7.8201 x 10^-3     6.5401 x 10^-3     5.8660 x 10^-3
               θ^SD                 1.8296 x 10^-2     1.4721 x 10^-2     1.1034 x 10^-2
λ_2 = 0.1 (?)  θ^AD                 3.5199 x 10^-1     3.3522 x 10^-1     2.8713 x 10^-1
               θ^WLS                1.2301 x 10^-2     7.8301 x 10^-3     7.7180 x 10^-3
               θ^SD                 2.0822 x 10^-2     2.0301 x 10^-2     1.9741 x 10^-2
λ_2 = 0.05     θ^AD                 1.6238             1.5553             1.5223
               θ^WLS                1.6312 x 10^-2     1.5100 x 10^-2     1.2013 x 10^-2
               θ^SD                 3.0344 x 10^-2     2.7057 x 10^-2     2.2510 x 10^-2

Config 3                            n = 30             n = 50             n = 100
λ_3 = 11 (?)   θ^AD                 > 10               > 10               > 10
               θ^WLS                4.1896 x 10^-4     3.1530 x 10^-4     1.7453 x 10^-4
               θ^SD                 4.6218 x 10^-1     1.8696 x 10^-4     1.?286 x 10^-1
λ_3 = 4        θ^AD                 > 10               > 10               > 10
               θ^WLS                9.1584 x 10^-4     3.3124 x 10^-1     2.8984 x 10^-4
               θ^SD                 3.4912 x 10^-1     2.3344 x 10^-1     2.2457 x 10^-1
λ_3 = 2        θ^AD                 > 10               > 10               > 10
               θ^WLS                2.0159 x 10^-2     1.1431 x 10^-2     2.4111 x 10^-4
               θ^SD                 9.0591 x 10^-4     2.0668 x 10^-4     1.9921 x 10^-4

Globally, the performance of the different estimates deteriorates when the proportion of
censored responses increases. The performances of $\hat\theta^{WLS}$ and $\hat\theta^{SD}$ are
globally similar. In all tested configurations, $\hat\theta^{WLS}$ and $\hat\theta^{SD}$ seem to
perform better than $\hat\theta^{AD}$. As expected, in
the situation where $X$ does not have a density, $\hat\theta^{AD}$ does not converge.



6 Appendix 

6.1 Some results on Kaplan-Meier integrals 

In this section, we recall some facts on the behavior of Kaplan-Meier integrals. The first part
of this section is devoted to the i.i.d. representation of Kaplan-Meier integrals derived by
Stute (1995, 1996), first in the univariate case, then in the presence of covariates. For this,
define, for any function $\phi$,

$$U_i(\phi) = \int \phi(x,y)\,\psi(y,T_i,\delta_i)\,dF_{(X,Y)}(x,y),$$

where $\psi$ has been defined in Theorem 4.1. It can be easily shown that $E[U_i(\phi)] = 0$.
The following theorem has been derived by Stute (1996).
Theorem 6.1 Let $\phi$ be a function satisfying

$$\int |\phi(x,y)|\,C^{1/2}(y)\,dF_{(X,Y)}(x,y) < \infty.$$

Then

$$\int \phi(x,y)\,d\hat F_{(X,Y)}(x,y) = \frac{1}{n}\sum_{i=1}^n \frac{\delta_i\,\phi(X_i,T_i)}{1 - G(T_i-)} + \frac{1}{n}\sum_{i=1}^n U_i(\phi) + o_P(n^{-1/2}).$$



In view of the expression (2.5) of the jumps of the Kaplan-Meier estimate, this theorem
shows that, asymptotically, these jumps can be replaced by the "ideal" jumps, say
$W_i^* = n^{-1}\delta_i[1 - G(T_i-)]^{-1}$, plus some perturbation that only appears in the study of the variance
(since its expectation is zero). The following lemma makes precise
the difference between the jumps $W_{in}$ and the "ideal" jumps $W_i^*$.
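
The comparison between the two kinds of jumps can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names `km_censoring_left_limit` and `km_jumps` are hypothetical, the exponential simulation design is an assumption chosen so that the true $G$ is known, and ties among the observations are assumed away.

```python
import numpy as np


def km_censoring_left_limit(T, delta):
    """Kaplan-Meier estimate of G(T_i-) for the censoring distribution.

    C is observed exactly when delta == 0, so censored points play the role
    of events here. Assumes there are no ties among the T_i.
    """
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(T)
    order = np.argsort(T)
    cens = 1 - delta[order]                 # 1 when C_i is observed
    at_risk = n - np.arange(n)              # number of T_j >= T_(k)
    factors = 1.0 - cens / at_risk          # Kaplan-Meier factor at T_(k)
    surv_after = np.cumprod(factors)        # 1 - G_hat(T_(k))
    surv_before = np.concatenate(([1.0], surv_after[:-1]))  # 1 - G_hat(T_(k)-)
    G_left = np.empty(n)
    G_left[order] = 1.0 - surv_before
    return G_left


def km_jumps(T, delta):
    """Jumps W_in = delta_i / (n * (1 - G_hat(T_i-)))."""
    n = len(T)
    return np.asarray(delta) / (n * (1.0 - km_censoring_left_limit(T, delta)))


# Illustrative design (an assumption): Y ~ Exp(mean 1), C ~ Exp(mean 2).
rng = np.random.default_rng(0)
n = 2000
Y = rng.exponential(1.0, size=n)
C = rng.exponential(2.0, size=n)            # true G(t) = 1 - exp(-t / 2)
T = np.minimum(Y, C)
delta = (Y <= C).astype(int)

W_in = km_jumps(T, delta)
W_star = delta / (n * np.exp(-T / 2.0))     # ideal jumps, computed with the true G

phi = np.minimum(T, 1.0)                    # bounded test function phi(y) = min(y, 1)
print("Kaplan-Meier integral:", np.sum(W_in * phi))
print("ideal weighted sum   :", np.sum(W_star * phi))
print("target E[min(Y, 1)]  :", 1.0 - np.exp(-1.0))
```

Both weighted sums estimate the same expectation; the ideal one is only available in a simulation because it requires the true censoring distribution.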
Lemma 6.2 Recall that $\hat G$ is the Kaplan-Meier estimator for the distribution of $C$, $W_{in} = n^{-1}\delta_i[1 - \hat G(T_i-)]^{-1}$
and $W_i^* = n^{-1}\delta_i[1 - G(T_i-)]^{-1}$, and denote by $T_{(n)}$ the largest observation.

i) We have

$$\sup_{t \le T_{(n)}} \frac{1 - \hat G(t-)}{1 - G(t-)} = O_P(1) \quad\text{and}\quad \sup_{t \le T_{(n)}} \frac{1 - G(t-)}{1 - \hat G(t-)} = O_P(1); \tag{6.18}$$

ii) For all $0 \le \eta \le 1$ and $\varepsilon > 0$,

$$|W_{in} - W_i^*| \le W_i^*\,[C(T_i-)]^{\eta[1/2+\varepsilon]} \times O_P(n^{-\eta/2}), \tag{6.19}$$

where the $O_P(n^{-\eta/2})$ factor does not depend on $i$.
Proof. 

i) The first part of (6.18) follows from Theorem 3.2.4 in Fleming and Harrington
(1991). The second part follows for instance as a consequence of Theorem 2.2 in Zhou
(1991).

ii) Fix $\varepsilon > 0$ arbitrarily. Since $\int_a^{\tau_H} C^{-1-2\varepsilon}(y)\,dC(y) < \infty$ for some $a > 0$, apply
Theorem 1 in Gill (1983) to see that

$$\sup_{y \le T_{(n)}} [C(y)]^{-1/2-\varepsilon}\,|Z(y)| = O_P(1), \tag{6.20}$$

where $Z = \sqrt{n}\{\hat G - G\}\{1 - G\}^{-1}$ is the Kaplan-Meier process. Next, the proof can be
completed by using the definitions of $W_{in}$, $W_i^*$, property (6.18), and elementary algebra.
■
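
One way to spell out the elementary algebra behind ii) is sketched below. It is our reading of the argument in the notation of Lemma 6.2, not a quotation of the original proof; in particular, the interpolation between the cases $\eta = 0$ and $\eta = 1$ is our own presentation.

$$\begin{aligned}
W_{in} - W_i^* &= \frac{\delta_i}{n}\left[\frac{1}{1 - \hat G(T_i-)} - \frac{1}{1 - G(T_i-)}\right] = W_i^*\,\frac{\hat G(T_i-) - G(T_i-)}{1 - \hat G(T_i-)} \\
&= W_i^*\,n^{-1/2}\,Z(T_i-)\,\frac{1 - G(T_i-)}{1 - \hat G(T_i-)}.
\end{aligned}$$

By (6.18), the last ratio is $O_P(1)$ uniformly in $i$, and by (6.20), $|Z(T_i-)| \le [C(T_i-)]^{1/2+\varepsilon}\,O_P(1)$, which gives the case $\eta = 1$. The first line combined with (6.18) gives $|W_{in} - W_i^*| \le W_i^*\,O_P(1)$, which is the case $\eta = 0$. Writing $|a| = |a|^{\eta}|a|^{1-\eta}$ then yields

$$|W_{in} - W_i^*| \le \left\{W_i^*\,[C(T_i-)]^{1/2+\varepsilon}\,O_P(n^{-1/2})\right\}^{\eta}\left\{W_i^*\,O_P(1)\right\}^{1-\eta} = W_i^*\,[C(T_i-)]^{\eta[1/2+\varepsilon]}\,O_P(n^{-\eta/2}).$$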

6.2 Proof of Theorem ED 

In this section, we show that the criteria $M_n^{WLS}$ and $M_n^{SD}$ satisfy the conditions (4.10)
and (4.11). The same properties can also be shown for the synthetic data estimators of
Leurgans (1987) and Lai et al. (1995). More details can be found in Delecroix et al.
(2008). For the sake of simplicity, we only prove it for $M_n^{WLS}$ since the proof for $M_n^{SD}$
uses similar arguments.
Proof for $M_n^{WLS}$. Write

$$\begin{aligned}
M_n^{WLS}(\theta, f) - M(\theta) ={}& 2\int (y - f(\theta_0'x;\theta_0))\{f(\theta_0'x;\theta_0) - f(\theta'x;\theta)\}\,d(\hat F_{(X,Y)} - F_{(X,Y)})(x,y) \\
&+ \int \{f(\theta_0'x;\theta_0) - f(\theta'x;\theta)\}^2\,d(\hat F_{(X,Y)} - F_{(X,Y)})(x,y) \\
&+ \int (y - f(\theta_0'x;\theta_0))^2\,d(\hat F_{(X,Y)} - F_{(X,Y)})(x,y).
\end{aligned} \tag{6.21}$$

The last term does not depend on $\theta$. Let

$$\chi(x,y) = \{y - f(\theta_0'x;\theta_0)\}\nabla_\theta f(x;\theta_0).$$



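Using the differentiability Assumption 7 and Theorem 6.1, the first term in the right-hand
side of (6.21) is

$$\begin{aligned}
& 2(\theta_0 - \theta)'\int \chi(x,y)\,d(\hat F_{(X,Y)} - F_{(X,Y)})(x,y) \\
&\quad + 2(\theta_0 - \theta)'\left[\int \{y - f(\theta_0'x;\theta_0)\}\nabla_\theta^2 f(x;\tilde\theta)\,d(\hat F_{(X,Y)} - F_{(X,Y)})(x,y)\right](\theta_0 - \theta) \\
&= 2(\theta_0 - \theta)'\left[\frac{1}{n}\sum_{i=1}^n \left\{\frac{\delta_i\chi(X_i,T_i)}{1 - G(T_i-)} - E\left[\frac{\delta\chi(X,T)}{1 - G(T-)}\right]\right\}\right] \\
&\quad + 2(\theta_0 - \theta)'\left[\frac{1}{n}\sum_{i=1}^n U_i(\chi)\right] + o_P(\|\theta - \theta_0\|^2) + o_P(n^{-1/2}\|\theta - \theta_0\|),
\end{aligned} \tag{6.22}$$

where $\tilde\theta$ lies between $\theta$ and $\theta_0$, the $o_P(\|\theta - \theta_0\|^2)$ term comes from the boundedness of $\nabla_\theta^2 f$ and the consistency of Kaplan-Meier
integrals, and the $o_P(n^{-1/2}\|\theta - \theta_0\|)$ term is the remainder in Theorem 6.1. Furthermore, the empirical sums in (6.22) converge weakly to centered Gaussian
variables at rate $O_P(n^{-1/2})$. For the second term in (6.21), rewrite it as

$$(\theta - \theta_0)'\left[\int \nabla_\theta f(x;\tilde\theta)\nabla_\theta f(x;\tilde\theta)'\,d(\hat F_{(X,Y)} - F_{(X,Y)})(x,y)\right](\theta - \theta_0).$$

From the boundedness of $\nabla_\theta f$, deduce that this is $o_P(\|\theta - \theta_0\|^2)$. We thus obtained
(4.10). To obtain (4.11), use Theorem EU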



6.3 Properties of $\hat f$



In this section, we derive some properties of $\hat f$ defined by (2.8), and show that this
estimate satisfies Assumptions 6 and 13. Our approach consists of comparing $\hat f$ to the
ideal estimator $f^*$ defined as

$$f^*(\theta'x;\theta) = \frac{\sum_{i=1}^n W_i^*\,T_i\,K\!\left(\frac{\theta'X_i - \theta'x}{h}\right)}{n^{-1}\sum_{j=1}^n K\!\left(\frac{\theta'X_j - \theta'x}{h}\right)}, \tag{6.23}$$

that is the estimator based on the true (uncomputable) $Y^*$. Indeed, $f^*$ is a regular kernel
estimator based on uncensored variables, and can be studied by traditional nonparametric
kernel techniques.
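
To make the ideal estimator concrete, here is a minimal sketch of a weighted kernel fit of this type. Several things are assumptions rather than the paper's specification: the ratio form (inferred from the difference formula (6.27)), the triweight kernel, the helper name `weighted_kernel_fit`, and the simulated model with exponential censoring. The true $G$ is known here only because the data are simulated.

```python
import numpy as np


def triweight(s):
    """Compactly supported kernel on [-1, 1] (an illustrative choice)."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 35.0 / 32.0 * (1.0 - s**2) ** 3, 0.0)


def weighted_kernel_fit(u, index, T, W, h, kernel=triweight):
    """sum_i W_i T_i K((index_i - u)/h) / (n^{-1} sum_j K((index_j - u)/h)).

    index holds the values theta'X_i; W holds either the ideal weights W_i^*
    or the Kaplan-Meier jumps W_in. Returns NaN where the denominator is zero.
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    K = kernel((index[None, :] - u[:, None]) / h)   # shape (len(u), n)
    num = K @ (W * T)
    den = K.mean(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


# Simulated single-index data (assumed design, for illustration only).
rng = np.random.default_rng(1)
n, h = 4000, 0.2
theta0 = np.array([0.6, 0.8])
X = rng.uniform(-1.0, 1.0, size=(n, 2))
index = X @ theta0
Y = 1.0 + index**2 + 0.1 * rng.standard_normal(n)   # f(u) = 1 + u^2
C = rng.exponential(8.0, size=n)                    # G(t) = 1 - exp(-t / 8) for t > 0
T = np.minimum(Y, C)
delta = (Y <= C).astype(float)

# Ideal weights W_i^* = delta_i / (n (1 - G(T_i-))), using the true G.
W_star = delta / (n * np.exp(-np.maximum(T, 0.0) / 8.0))

grid = np.array([-0.5, 0.0, 0.5])
f_star = weighted_kernel_fit(grid, index, T, W_star, h)
print(np.column_stack([grid, f_star, 1.0 + grid**2]))  # point, fit, true value
```

Replacing `W_star` by Kaplan-Meier jumps computed from the data would give a computable analogue of the same fit.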

Assumptions on the random variables $\theta'X$.

Assumption 14 For all $\theta \in \Theta$, $\theta'X$ has a density which is continuously differentiable, with
uniformly bounded derivative.

Assumptions on the kernel function.

Assumption 15 • $K$ is a symmetric, positive, twice continuously differentiable function
with $K''$ satisfying a Lipschitz condition.

• $\int K(s)\,ds = 1$.

• $K$ has a compact support, say $[-1; 1]$.

Assumptions on the bandwidth.

Assumption 16 • $nh^8 \to 0$.

• n/^pogfa)] -1 = 0(1). 
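
As an illustration of the kernel conditions in Assumption 15, the sketch below checks them numerically for the triweight kernel. The choice of this kernel is ours, not the paper's; the closed form of its second derivative is a direct computation, and the grid-based checks are approximate by nature.

```python
import numpy as np


def triweight(s):
    """K(s) = (35/32) (1 - s^2)^3 on [-1, 1], zero elsewhere."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 35.0 / 32.0 * (1.0 - s**2) ** 3, 0.0)


def triweight_second_derivative(s):
    """K''(s) = (105/16) (1 - s^2) (5 s^2 - 1) on [-1, 1], zero elsewhere."""
    s = np.asarray(s, dtype=float)
    inside = np.abs(s) <= 1.0
    return np.where(inside, 105.0 / 16.0 * (1.0 - s**2) * (5.0 * s**2 - 1.0), 0.0)


grid = np.linspace(-1.5, 1.5, 30001)
step = grid[1] - grid[0]
K = triweight(grid)
K2 = triweight_second_derivative(grid)

integral = np.sum((K[1:] + K[:-1]) / 2.0) * step      # trapezoidal rule for int K
symmetric = bool(np.allclose(K, K[::-1]))             # K(-s) = K(s) on the grid
nonnegative = bool(np.all(K >= 0.0))
lipschitz_ratio = np.max(np.abs(np.diff(K2))) / step  # largest difference quotient of K''

print("integral of K       :", integral)
print("symmetric           :", symmetric)
print("nonnegative         :", nonnegative)
print("difference quotient :", lipschitz_ratio)
```

The difference quotient of $K''$ stays bounded on the grid, which is the numerical counterpart of the Lipschitz condition; $K''$ vanishes at $\pm 1$, so it is continuous across the boundary of the support.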

The first lemma we propose allows us to obtain uniform convergence rates for the
ideal estimator $f^*$ as an immediate corollary.

Lemma 6.3 Let $K$ be a kernel satisfying Assumption 15. Let $\tilde K$ denote either $K$ or its
derivative. Let $Z$ be a random variable with a 4th order moment, with $m(x) = E[Z \mid X = x]$
twice continuously differentiable, with derivatives of order 0, 1 and 2 uniformly bounded.
Consider, for $d = 0, 1$, and any vectors $x$ and $x'$ in $\mathcal X$,

$$g_n(\theta,x,x',d) = \frac{1}{nh^{1+d}}\sum_{i=1}^n Z_i\,K\!\left(\frac{\theta'X_i - \theta'x}{h}\right)\tilde K\!\left(\frac{\theta'X_i - \theta'x'}{h}\right).$$

We have, for $d = 0, 1$,

$$\sup_{\theta,x,x'} |g_n(\theta,x,x',d) - E[g_n(\theta,x,x',d)]| = O_P(n^{-1/2}h^{-(d+1)/2}[\log n]^{1/2}), \tag{6.24}$$

$$\sup_{\theta,x:\,f_\theta(\theta'x) > c/2} |E[g_n(\theta,x,x,d)] - E[Z \mid X = x]| = O(h^2), \tag{6.25}$$

$$\sup_{\theta,x,x'} |E[g_n(\theta,x,x',1)]| = O(1). \tag{6.26}$$

Corollary 6.4 Under Assumption UR 

$$\|f^* - f\| = O_P(n^{-1/2}h^{-1/2}[\log n]^{1/2} + h^2),$$
$$\|\nabla_\theta f^* - \nabla_\theta f\| = O_P(n^{-1/2}h^{-3/2}[\log n]^{1/2} + h^2).$$

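Proof. For the bias terms (6.25) and (6.26), this can be done by a classical change of
variables, a Taylor expansion, and the fact that $\int uK(u)\,du = 0$ and $\int u^2K(u)\,du < \infty$.
For (6.24), first consider

$$g_n^{M_n}(\theta,x,x',d) = \frac{1}{nh^{1+d}}\sum_{i=1}^n Z_i\mathbf{1}_{Z_i \le M_n}\,K\!\left(\frac{\theta'X_i - \theta'x}{h}\right)\tilde K\!\left(\frac{\theta'X_i - \theta'x'}{h}\right).$$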



We then follow the methodology of Einmahl and Mason (2005). From Pakes and Pollard
(1989), the family of functions indexed by $(\theta,x,x',h)$ (which has a constant envelope
function),

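$$(X,Z) \mapsto K\!\left(\frac{\theta'X - \theta'x}{h}\right)\tilde K\!\left(\frac{\theta'X - \theta'x'}{h}\right)Z\,\mathbf{1}_{Z \le M_n},$$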



satisfies the uniform entropy condition of Proposition 1 in Einmahl and Mason (2005) 
(condition (ii) in their Proposition 1). The other assumptions in their Proposition 1 
hold with (3 = a = CM, for some constant C not depending on M. We then can apply 
Talagrand's inequality (see Einmahl and Mason, 2005, and Talagrand, 1994), with Oq = 
n~ l / 2 h~^ d+l ^ 2 . Take M n = n 1 / 2 ^/ 2 . It follows from Talagrand's inequality that 

$$\sup_{\theta,x,x'} |g_n^{M_n}(\theta,x,x',d) - E[g_n^{M_n}(\theta,x,x',d)]| = O_P(n^{-1/2}h^{-(d+1)/2}[\log n]^{1/2}).$$

It remains to consider $g_n^{M_n} - g_n$. This difference is bounded by $Cn^{-1}h^{-(1+d)}\sum_{i=1}^n |Z_i|\mathbf{1}_{Z_i > M_n}$
for some constant $C$. This is a sum of positive quantities, thus we only have to show that
its expectation is $o(n^{-1/2}h^{-(d+1)/2}[\log n]^{1/2})$. For this, apply Hölder's inequality to bound
this expectation by $h^{-(1+d)}E[Z^4]^{1/4}\,P(Z > M_n)^{3/4}$. Moreover, $P(Z > M_n) \le E[Z^4]/M_n^4$
from Chebyshev's inequality, and the result follows. ■
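
The moment argument in the last step can be unpacked as follows. This is a sketch of the computation as we read it, using Hölder's inequality with exponents $4$ and $4/3$ followed by the tail bound on $Z^4$:

$$E\big[|Z|\,\mathbf{1}_{\{Z > M_n\}}\big] \le E[Z^4]^{1/4}\,P(Z > M_n)^{3/4} \le E[Z^4]^{1/4}\left(\frac{E[Z^4]}{M_n^4}\right)^{3/4} = \frac{E[Z^4]}{M_n^3}.$$

Hence the expectation of the bound on the truncation error is at most $C\,h^{-(1+d)}\,E[Z^4]\,M_n^{-3}$, which is of the required order as soon as $M_n^{-3} = o\big(n^{-1/2}h^{(d+1)/2}[\log n]^{1/2}\big)$.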
Proposition 6.5 below ensures that the difference between $\hat f$ and $f^*$, in view of uniform
consistency, is asymptotically negligible. Hence Assumption 6 can be deduced from the
uniform consistency of $f^*$.

Proposition 6.5 Under Assumptions [Tfy and 7, and Kernel Assumptions 15 and 16, we
have

$$\|\hat f - f^*\|_\infty + \|\nabla_\theta \hat f - \nabla_\theta f^*\|_\infty = o_P(1).$$

Corollary 6.6 Under the Assumptions of Proposition 6.5, $\hat f$ satisfies Assumption 6 and
condition in Assumption 13.

Proof. Let $\hat f_{\theta'X}(u) = n^{-1}h^{-1}\sum_{i=1}^n K((\theta'X_i - u)/h)$. We have

$$\hat f(u;\theta) - f^*(u;\theta) = \frac{1}{h}\sum_{i=1}^n \frac{[W_{in} - W_i^*]\,T_i\,K\!\left(\frac{\theta'X_i - u}{h}\right)}{\hat f_{\theta'X}(u)}. \tag{6.27}$$

Now, from uniform consistency of the kernel density estimator (see, e.g. Einmahl and Mason,
2005),

$$\sup_{\theta,x} |\hat f_{\theta'X}(\theta'x) - f_{\theta'X}(\theta'x)| = o_P(1).$$

Using this result on the set $\{f_{\theta'X}(\theta'x) > c > 0\}$, and Lemma 6.2 ii) with $\eta$ sufficiently
small, we obtain the bound

$$|\hat f(\theta'x;\theta) - f^*(\theta'x;\theta)| \le O_P(n^{-\eta/2}h^{-1})\sum_{i=1}^n W_i^*\,C^{\eta(1/2+\varepsilon)}(T_i-)\,|T_i|\,K\!\left(\frac{\theta'X_i - \theta'x}{h}\right), \tag{6.28}$$

where the $O_P$-rate does not depend on $\theta$ nor $x$. Recalling the definition of $W_i^*$, consider
the family of functions indexed by $\theta$ and $x$,

$$\left\{(T,\delta,X) \mapsto \delta T[1 - G(T-)]^{-1}C^{\eta(1/2+\varepsilon)}(T-)\,K((\theta'X - \theta'x)/h)\right\}.$$

This family is Euclidean (see Lemma 22 in Nolan and Pollard, 1987) for an envelope
$\delta T[1 - G(T-)]^{-1}C^{\eta(1/2+\varepsilon)}(T-)$ which is, for $\eta = 1/2$, square integrable from Assumption
[TUl. Therefore, using the assumptions on the bandwidth,

$$\sup_{x \in \mathcal X,\,\theta \in \Theta}\left|\sum_{i=1}^n W_i^*\,C^{\eta(1/2+\varepsilon)}(T_i-)\,T_i\,K\!\left(\frac{\theta'X_i - \theta'x}{h}\right)\right| = O_P(h) + O_P(n^{-1/2}).$$

Finally, back to (6.28), this shows that $\|\hat f - f^*\|_\infty = O_P(n^{-1/4}) = o_P(1)$.
Similarly, $\|\nabla_\theta \hat f - \nabla_\theta f^*\|_\infty = O_P(n^{-1/4}h^{-1})$.

Now, to prove the corollary, we have to show the uniform consistency of $f^*$ and $\nabla_\theta f^*$,
which can be done applying Theorem 2 in Einmahl and Mason (2005). ■

The following Proposition allows us to obtain that $\hat f$ satisfies conditions (4.14) and
(4.15) of Assumption 13.

Proposition 6.7 Let $\|\cdot\|_{\Theta_n}$ denote the supremum of the absolute value over $\Theta_n$. Under
the Assumptions of Proposition 6.5, we have



WfJ^XiXTi - MXi, e ))(V e f(X i; 9) - VeF(X i; 9)) 



i=i 



J2w;j(e' x i )(f(e , x i ;e ) - r(9' x l ;9 ))(v e f(x l ;9) - v e r(x i; 9)) 



i=l 



Y,w:j{e' Q x i ){f{e' Q x i -e Q ) - r{9' Q x i -9 )){v e f{x i -9) - v,rpr i; 0)) 



i=l 



W*(f(9' X i] 9 ) - r(e' X i; 9 ))(VeK^i\ e ) ~ ^ef*{X,- 9)) 



i=l 



Or, 



--Opin- 1 ), 
(6.29) 

--0 P {n- v ), 
(6.30) 

(6.31) 

(6.32) 



Corollary 6.8 Under the assumptions of Proposition 6.5, $\hat f$ satisfies conditions (4.14)
and (4.15) of Assumption 13.

Proof of Corollary 6.8. To prove (4.15), according to Proposition 6.7, it remains
to show that



sup 

eee n 



W*J(9' X i )(f(9' X i] 9 ) - r(9' X t ; 9 ))(V e f(X i] 9) - V e f\X t] 9)) 



i=l 



which can be done following the lines of Lemma C2 in Delecroix et al. (2008). Similarly,
Proposition 6.7 allows us to replace $\hat f$ by $f^*$. ■

Proof of Proposition 6.7. We only prove (6.30) and (6.32) since the others are
similar.

We first prove (6.30). This can be done by studying separately the different terms
arising from the differentiation with respect to $\theta$ in the definition of $\hat f$. We will only study the
term coming from the differentiation of the numerator (since the other is similar), that
is

1 - cm-) K — h — lUxVXiW-wM. 



nh 2 ^ 1 - G(Ti 

By bounding $|K'|$ by $\|K'\|_\infty$ and using the convergence rate of $f^*$, it is easily seen that the
terms for $i = j$ can be removed from this double sum, at the cost of an $o_P(n^{-1/2})$ term uniform
in $\theta$. Applying (6.27), we then get that the above quantity is, up to an $o_P(n^{-1/2})$ term,

xk' ^' Xl ~ h d ' X] ) [w; - w 3n ]Tj e , x {e'x l r i h [)X {e',x i r\ 

Again, using Lemma 6.2 ii) with $\eta = 1/2$, and bounding $K'$ by $\|K'\|_\infty$ allows us to remove
the terms for $j = k$ and $i = k$. For the rest of this triple sum, apply Lemma 6.2 ii) with
$\eta = 1$ and bound $K'$ by $\|K'\|_\infty$. It follows that the left-hand side of (6.30) is bounded,
uniformly in $\theta$, by

0p{n ~^ h ~ 2) Yl ^/^w i *c' 1 / 2+£ (T i -)C' 1 / 2 + £ (r fe -)|T i ||r fe |K f e 'o x i- e o x k \ 

The last sum has finite expectation (and does not depend on $\theta$) from Assumption [TU1

For (6.32), again, we will consider only the part of $\nabla_\theta \hat f$ coming from the differentiation
of the numerator; this means that we are trying to bound

l^wAV(^x J )[/(^x^ )-r(^x^o)]/r ( 9 ' Xi ~/ Xi ) T^-w^h'xie'Xi)- 1 . 

(6.33) 

First, let $S_\tau$ be the double sum deduced from (6.33) by introducing $\mathbf{1}_{T_j \le \tau}$ for some
$\tau < \tau_H$. From Gill (1983), $\sup_{t \le \tau} |\hat G(t) - G(t)|\,[1 - G(t)]^{-1} = O_P(n^{-1/2})$, and consequently,
$\sup_j |W_j^* - W_{jn}|\mathbf{1}_{T_j \le \tau} = O_P(n^{-1/2})$. Now, using the uniform convergence rate of $f^*$ and
bounding $K'$ by $\|K'\|_\infty$ shows that $S_\tau = O_P(n^{-1/2})$ for any $\tau < \tau_H$. To obtain a bound
for (6.33), we then have to make $\tau$ tend to $\tau_H$. For this, we use the same Cramér-Slutsky
argument as Stute (1995) in his proof of the Central Limit Theorem under censoring.
Using Lemma 6.2 ii) with $\eta = 1$, observe that

\s TH - s T \ < Opirc^h-'w - f\\oolJ2 K ( 9 ' oXi ~ d '° Xj ) i Tj> _ t w;c^{t 3 -)w:. 
The last part does not depend on $\theta$ and its expectation tends to zero as $\tau \to \tau_H$, while
the rest is $O_P(n^{-1/2})$, using the convergence rate of $f^*$ and the Assumptions [TBI. Then
the Cramér-Slutsky argument of Stute allows us to conclude. ■

The only condition that still needs to be checked is that $u \mapsto \hat f(u;\theta_0) \in \mathcal F$, where $\mathcal F$
is defined in Assumption 12. This can be done if we specify this class of functions. If
$\mathcal F = C^1(\theta_0'\mathcal X, M)$, it suffices to show that $\sup_u |\hat f'(u;\theta_0) - f'(u;\theta_0)| = o_P(1)$, which can
be done by using the same method as in Proposition 6.7 to replace $\hat f$ by $f^*$.

6.4 Trimming 

In the following proposition, we show that the trimming $J_n(\theta_n'x)$ can be replaced by
$J(\theta_0'x)$ modulo arbitrarily small terms.
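
The effect of replacing one trimming by the other can be explored in a small simulation. The sketch below rests on assumptions that are ours: the trimming is taken to be an indicator that the density of the index exceeds a threshold $c$ (with a kernel density estimate for $J_n$), the covariates are standard Gaussian so that the true density is known, and `theta_n` is a hypothetical root-$n$ perturbation of $\theta_0$ rather than an actual estimator.

```python
import numpy as np


def triweight(s):
    """Compactly supported kernel on [-1, 1] (an illustrative choice)."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, 35.0 / 32.0 * (1.0 - s**2) ** 3, 0.0)


def index_density(u, index, h):
    """Kernel density estimate (n h)^{-1} sum_i K((index_i - u) / h)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return triweight((index[None, :] - u[:, None]) / h).mean(axis=1) / h


rng = np.random.default_rng(2)
n, h, c = 1500, 0.3, 0.05
theta0 = np.array([0.6, 0.8])                          # unit norm: theta0'X ~ N(0, 1)
theta_n = theta0 + rng.normal(scale=n**-0.5, size=2)   # hypothetical estimate
X = rng.standard_normal((n, 2))

# J(theta0'X_i): trimming based on the true density of the index (assumed form).
true_density = np.exp(-0.5 * (X @ theta0) ** 2) / np.sqrt(2.0 * np.pi)
J = true_density > c

# J_n(theta_n'X_i): trimming based on the estimated density at the estimated index.
index_n = X @ theta_n
J_n = index_density(index_n, index_n, h) > c

print("share trimmed by J  :", 1.0 - J.mean())
print("share where J != J_n:", np.mean(J != J_n))
```

The two indicators can only disagree for observations whose index density is close to the threshold, which is the intuition behind the bound used in the proof below.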

Proposition 6.9 Let, for any function $\phi$,

1 n 

R n = -Y] <t>(°, G, /; Ti, 5 h Xi) [J{&M - Ue'nXi)] . 

We have R n = o P (± £? =1 G, /; 5,, X,)) op^" 1 /*) . 
Proof. For any 5 > 0, we have, with probability tending to one, 

\J(9' Xi) - J n (6' n Xi)\ < lf e , (e' x)<c-5,f eliX (e' n x)>c + Ms;oo}(Z n ) , 

where Z n = sup x \ fe' n x{6' n X) — fg' Q x(d' x)\J(x). As in Delecroix, Hristache, Patilea (2006) 
page 737-738, we have 

Note that F(n 1/2 Z n > 5) < F(Z n > 5), which tends to zero as 5 tends to zero. ■ 